## [1] "X" "fixed.acidity" "volatile.acidity"
## [4] "citric.acid" "residual.sugar" "chlorides"
## [7] "free.sulfur.dioxide" "total.sulfur.dioxide" "density"
## [10] "pH" "sulphates" "alcohol"
## [13] "quality"
## [1] 1599 13
## 'data.frame': 1599 obs. of 13 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
We can see that quality is an integer rating, so it can be converted into a factor.
## Factor w/ 6 levels "3","4","5","6",..: 3 3 3 4 3 3 3 5 5 3 ...
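The conversion described above can be sketched as follows. This is a minimal illustration on the first ten quality scores shown in the `str()` output, not the exact code used for the report; the names `quality`, `quality_f` and `quality_ord` are assumptions made for the example. Because only ten rows are used, the factor has three levels here rather than the six found in the full data.

```r
# First ten quality scores, copied from the str() output above
quality <- c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)

# Convert the integer ratings into a factor (one level per distinct score)
quality_f <- factor(quality)
str(quality_f)

# Optional variant (an assumption, not necessarily what the report did):
# an ordered factor also records that 5 < 6 < 7
quality_ord <- factor(quality, ordered = TRUE)
levels(quality_ord)
```

An ordered factor can be useful later when quality is plotted on an axis, since the levels keep their natural order.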
## X fixed.acidity volatile.acidity citric.acid
## Min. : 1.0 Min. : 4.60 Min. :0.1200 Min. :0.000
## 1st Qu.: 400.5 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090
## Median : 800.0 Median : 7.90 Median :0.5200 Median :0.260
## Mean : 800.0 Mean : 8.32 Mean :0.5278 Mean :0.271
## 3rd Qu.:1199.5 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420
## Max. :1599.0 Max. :15.90 Max. :1.5800 Max. :1.000
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 0.900 Min. :0.01200 Min. : 1.00
## 1st Qu.: 1.900 1st Qu.:0.07000 1st Qu.: 7.00
## Median : 2.200 Median :0.07900 Median :14.00
## Mean : 2.539 Mean :0.08747 Mean :15.87
## 3rd Qu.: 2.600 3rd Qu.:0.09000 3rd Qu.:21.00
## Max. :15.500 Max. :0.61100 Max. :72.00
## total.sulfur.dioxide density pH sulphates
## Min. : 6.00 Min. :0.9901 Min. :2.740 Min. :0.3300
## 1st Qu.: 22.00 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500
## Median : 38.00 Median :0.9968 Median :3.310 Median :0.6200
## Mean : 46.47 Mean :0.9967 Mean :3.311 Mean :0.6581
## 3rd Qu.: 62.00 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300
## Max. :289.00 Max. :1.0037 Max. :4.010 Max. :2.0000
## alcohol quality
## Min. : 8.40 3: 10
## 1st Qu.: 9.50 4: 53
## Median :10.20 5:681
## Mean :10.42 6:638
## 3rd Qu.:11.10 7:199
## Max. :14.90 8: 18
We have 1599 observations. From the summary, we can see that for some variables the maximum is much higher than the third quartile, which hints at outliers, e.g. fixed.acidity and volatile.acidity. We should check for outliers in our analysis.
Let us also summarise the data by quality, the most important feature.
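One common way to make the outlier hint concrete is the 1.5 × IQR rule. The sketch below applies it to quartiles copied from the `summary()` output; the rule itself and the names `q1`, `q3`, `mx` and `upper_fence` are assumptions chosen for illustration, not something the report prescribes.

```r
# Quartiles and maxima copied from the summary() output above
q1 <- c(fixed.acidity = 7.10, volatile.acidity = 0.39, total.sulfur.dioxide = 22)
q3 <- c(fixed.acidity = 9.20, volatile.acidity = 0.64, total.sulfur.dioxide = 62)
mx <- c(fixed.acidity = 15.90, volatile.acidity = 1.58, total.sulfur.dioxide = 289)

# Interquartile range and the conventional upper fence, Q3 + 1.5 * IQR
iqr <- q3 - q1
upper_fence <- q3 + 1.5 * iqr

# TRUE means the maximum lies beyond the fence, i.e. at least one outlier
data.frame(q3, max = mx, upper_fence, beyond_fence = mx > upper_fence)
```

For all three variables the maximum lies beyond the fence, which is consistent with the observation that outliers need checking.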
## rwines$quality: 3
## X fixed.acidity volatile.acidity citric.acid
## Min. : 460.0 Min. : 6.700 Min. :0.4400 Min. :0.0000
## 1st Qu.: 726.5 1st Qu.: 7.150 1st Qu.:0.6475 1st Qu.:0.0050
## Median :1100.0 Median : 7.500 Median :0.8450 Median :0.0350
## Mean :1053.2 Mean : 8.360 Mean :0.8845 Mean :0.1710
## 3rd Qu.:1446.2 3rd Qu.: 9.875 3rd Qu.:1.0100 3rd Qu.:0.3275
## Max. :1506.0 Max. :11.600 Max. :1.5800 Max. :0.6600
## residual.sugar chlorides free.sulfur.dioxide total.sulfur.dioxide
## Min. :1.200 Min. :0.0610 Min. : 3.0 Min. : 9.0
## 1st Qu.:1.875 1st Qu.:0.0790 1st Qu.: 5.0 1st Qu.:12.5
## Median :2.100 Median :0.0905 Median : 6.0 Median :15.0
## Mean :2.635 Mean :0.1225 Mean :11.0 Mean :24.9
## 3rd Qu.:3.100 3rd Qu.:0.1430 3rd Qu.:14.5 3rd Qu.:42.5
## Max. :5.700 Max. :0.2670 Max. :34.0 Max. :49.0
## density pH sulphates alcohol
## Min. :0.9947 Min. :3.160 Min. :0.4000 Min. : 8.400
## 1st Qu.:0.9961 1st Qu.:3.312 1st Qu.:0.5125 1st Qu.: 9.725
## Median :0.9976 Median :3.390 Median :0.5450 Median : 9.925
## Mean :0.9975 Mean :3.398 Mean :0.5700 Mean : 9.955
## 3rd Qu.:0.9988 3rd Qu.:3.495 3rd Qu.:0.6150 3rd Qu.:10.575
## Max. :1.0008 Max. :3.630 Max. :0.8600 Max. :11.000
## quality
## 3:10
## 4: 0
## 5: 0
## 6: 0
## 7: 0
## 8: 0
## --------------------------------------------------------
## rwines$quality: 4
## X fixed.acidity volatile.acidity citric.acid
## Min. : 19 Min. : 4.600 Min. :0.230 Min. :0.0000
## 1st Qu.: 262 1st Qu.: 6.800 1st Qu.:0.530 1st Qu.:0.0300
## Median : 831 Median : 7.500 Median :0.670 Median :0.0900
## Mean : 797 Mean : 7.779 Mean :0.694 Mean :0.1742
## 3rd Qu.:1262 3rd Qu.: 8.400 3rd Qu.:0.870 3rd Qu.:0.2700
## Max. :1522 Max. :12.500 Max. :1.130 Max. :1.0000
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 1.300 Min. :0.04500 Min. : 3.00
## 1st Qu.: 1.900 1st Qu.:0.06700 1st Qu.: 6.00
## Median : 2.100 Median :0.08000 Median :11.00
## Mean : 2.694 Mean :0.09068 Mean :12.26
## 3rd Qu.: 2.800 3rd Qu.:0.08900 3rd Qu.:15.00
## Max. :12.900 Max. :0.61000 Max. :41.00
## total.sulfur.dioxide density pH sulphates
## Min. : 7.00 Min. :0.9934 Min. :2.740 Min. :0.3300
## 1st Qu.: 14.00 1st Qu.:0.9957 1st Qu.:3.300 1st Qu.:0.4900
## Median : 26.00 Median :0.9965 Median :3.370 Median :0.5600
## Mean : 36.25 Mean :0.9965 Mean :3.382 Mean :0.5964
## 3rd Qu.: 49.00 3rd Qu.:0.9974 3rd Qu.:3.500 3rd Qu.:0.6000
## Max. :119.00 Max. :1.0010 Max. :3.900 Max. :2.0000
## alcohol quality
## Min. : 9.00 3: 0
## 1st Qu.: 9.60 4:53
## Median :10.00 5: 0
## Mean :10.27 6: 0
## 3rd Qu.:11.00 7: 0
## Max. :13.10 8: 0
## --------------------------------------------------------
## rwines$quality: 5
## X fixed.acidity volatile.acidity citric.acid
## Min. : 1 Min. : 5.000 Min. :0.180 Min. :0.0000
## 1st Qu.: 298 1st Qu.: 7.100 1st Qu.:0.460 1st Qu.:0.0900
## Median : 713 Median : 7.800 Median :0.580 Median :0.2300
## Mean : 742 Mean : 8.167 Mean :0.577 Mean :0.2437
## 3rd Qu.:1189 3rd Qu.: 8.900 3rd Qu.:0.670 3rd Qu.:0.3600
## Max. :1598 Max. :15.900 Max. :1.330 Max. :0.7900
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 1.200 Min. :0.03900 Min. : 3.00
## 1st Qu.: 1.900 1st Qu.:0.07400 1st Qu.: 9.00
## Median : 2.200 Median :0.08100 Median :15.00
## Mean : 2.529 Mean :0.09274 Mean :16.98
## 3rd Qu.: 2.600 3rd Qu.:0.09400 3rd Qu.:23.00
## Max. :15.500 Max. :0.61100 Max. :68.00
## total.sulfur.dioxide density pH sulphates
## Min. : 6.00 Min. :0.9926 Min. :2.880 Min. :0.370
## 1st Qu.: 26.00 1st Qu.:0.9962 1st Qu.:3.200 1st Qu.:0.530
## Median : 47.00 Median :0.9970 Median :3.300 Median :0.580
## Mean : 56.51 Mean :0.9971 Mean :3.305 Mean :0.621
## 3rd Qu.: 84.00 3rd Qu.:0.9979 3rd Qu.:3.400 3rd Qu.:0.660
## Max. :155.00 Max. :1.0031 Max. :3.740 Max. :1.980
## alcohol quality
## Min. : 8.5 3: 0
## 1st Qu.: 9.4 4: 0
## Median : 9.7 5:681
## Mean : 9.9 6: 0
## 3rd Qu.:10.2 7: 0
## Max. :14.9 8: 0
## --------------------------------------------------------
## rwines$quality: 6
## X fixed.acidity volatile.acidity citric.acid
## Min. : 4.0 Min. : 4.700 Min. :0.1600 Min. :0.0000
## 1st Qu.: 443.0 1st Qu.: 7.000 1st Qu.:0.3800 1st Qu.:0.0900
## Median : 882.5 Median : 7.900 Median :0.4900 Median :0.2600
## Mean : 847.4 Mean : 8.347 Mean :0.4975 Mean :0.2738
## 3rd Qu.:1224.8 3rd Qu.: 9.400 3rd Qu.:0.6000 3rd Qu.:0.4300
## Max. :1599.0 Max. :14.300 Max. :1.0400 Max. :0.7800
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 0.900 Min. :0.03400 Min. : 1.00
## 1st Qu.: 1.900 1st Qu.:0.06825 1st Qu.: 8.00
## Median : 2.200 Median :0.07800 Median :14.00
## Mean : 2.477 Mean :0.08496 Mean :15.71
## 3rd Qu.: 2.500 3rd Qu.:0.08800 3rd Qu.:21.00
## Max. :15.400 Max. :0.41500 Max. :72.00
## total.sulfur.dioxide density pH sulphates
## Min. : 6.00 Min. :0.9901 Min. :2.860 Min. :0.4000
## 1st Qu.: 23.00 1st Qu.:0.9954 1st Qu.:3.220 1st Qu.:0.5800
## Median : 35.00 Median :0.9966 Median :3.320 Median :0.6400
## Mean : 40.87 Mean :0.9966 Mean :3.318 Mean :0.6753
## 3rd Qu.: 54.00 3rd Qu.:0.9979 3rd Qu.:3.410 3rd Qu.:0.7500
## Max. :165.00 Max. :1.0037 Max. :4.010 Max. :1.9500
## alcohol quality
## Min. : 8.40 3: 0
## 1st Qu.: 9.80 4: 0
## Median :10.50 5: 0
## Mean :10.63 6:638
## 3rd Qu.:11.30 7: 0
## Max. :14.00 8: 0
## --------------------------------------------------------
## rwines$quality: 7
## X fixed.acidity volatile.acidity citric.acid
## Min. : 8.0 Min. : 4.900 Min. :0.1200 Min. :0.0000
## 1st Qu.: 490.5 1st Qu.: 7.400 1st Qu.:0.3000 1st Qu.:0.3050
## Median : 941.0 Median : 8.800 Median :0.3700 Median :0.4000
## Mean : 832.2 Mean : 8.872 Mean :0.4039 Mean :0.3752
## 3rd Qu.:1081.0 3rd Qu.:10.100 3rd Qu.:0.4850 3rd Qu.:0.4900
## Max. :1585.0 Max. :15.600 Max. :0.9150 Max. :0.7600
## residual.sugar chlorides free.sulfur.dioxide
## Min. :1.200 Min. :0.01200 Min. : 3.00
## 1st Qu.:2.000 1st Qu.:0.06200 1st Qu.: 6.00
## Median :2.300 Median :0.07300 Median :11.00
## Mean :2.721 Mean :0.07659 Mean :14.05
## 3rd Qu.:2.750 3rd Qu.:0.08700 3rd Qu.:18.00
## Max. :8.900 Max. :0.35800 Max. :54.00
## total.sulfur.dioxide density pH sulphates
## Min. : 7.00 Min. :0.9906 Min. :2.920 Min. :0.3900
## 1st Qu.: 17.50 1st Qu.:0.9948 1st Qu.:3.200 1st Qu.:0.6500
## Median : 27.00 Median :0.9958 Median :3.280 Median :0.7400
## Mean : 35.02 Mean :0.9961 Mean :3.291 Mean :0.7413
## 3rd Qu.: 43.00 3rd Qu.:0.9974 3rd Qu.:3.380 3rd Qu.:0.8300
## Max. :289.00 Max. :1.0032 Max. :3.780 Max. :1.3600
## alcohol quality
## Min. : 9.20 3: 0
## 1st Qu.:10.80 4: 0
## Median :11.50 5: 0
## Mean :11.47 6: 0
## 3rd Qu.:12.10 7:199
## Max. :14.00 8: 0
## --------------------------------------------------------
## rwines$quality: 8
## X fixed.acidity volatile.acidity citric.acid
## Min. : 268.0 Min. : 5.000 Min. :0.2600 Min. :0.0300
## 1st Qu.: 462.5 1st Qu.: 7.250 1st Qu.:0.3350 1st Qu.:0.3025
## Median : 709.0 Median : 8.250 Median :0.3700 Median :0.4200
## Mean : 826.7 Mean : 8.567 Mean :0.4233 Mean :0.3911
## 3rd Qu.:1182.5 3rd Qu.:10.225 3rd Qu.:0.4725 3rd Qu.:0.5300
## Max. :1550.0 Max. :12.600 Max. :0.8500 Max. :0.7200
## residual.sugar chlorides free.sulfur.dioxide
## Min. :1.400 Min. :0.04400 Min. : 3.00
## 1st Qu.:1.800 1st Qu.:0.06200 1st Qu.: 6.00
## Median :2.100 Median :0.07050 Median : 7.50
## Mean :2.578 Mean :0.06844 Mean :13.28
## 3rd Qu.:2.600 3rd Qu.:0.07550 3rd Qu.:16.50
## Max. :6.400 Max. :0.08600 Max. :42.00
## total.sulfur.dioxide density pH sulphates
## Min. :12.00 Min. :0.9908 Min. :2.880 Min. :0.6300
## 1st Qu.:16.00 1st Qu.:0.9942 1st Qu.:3.163 1st Qu.:0.6900
## Median :21.50 Median :0.9949 Median :3.230 Median :0.7400
## Mean :33.44 Mean :0.9952 Mean :3.267 Mean :0.7678
## 3rd Qu.:43.00 3rd Qu.:0.9972 3rd Qu.:3.350 3rd Qu.:0.8200
## Max. :88.00 Max. :0.9988 Max. :3.720 Max. :1.1000
## alcohol quality
## Min. : 9.80 3: 0
## 1st Qu.:11.32 4: 0
## Median :12.15 5: 0
## Mean :12.09 6: 0
## 3rd Qu.:12.88 7: 0
## Max. :14.00 8:18
We see that the majority of the wines have medium quality, 5 or 6. Very few have quality 3, 4 or 8.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.60 7.10 7.90 8.32 9.20 15.90
Fixed acidity looks roughly normally distributed. The median is 7.90, and the gap between the median and the mean (8.32) is small.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1200 0.3900 0.5200 0.5278 0.6400 1.5800
volatile.acidity also looks roughly normally distributed, but there is a large gap between the 3rd quartile (0.64) and the maximum (1.58), which points to outliers.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.090 0.260 0.271 0.420 1.000
The citric.acid plot doesn’t give us much idea about the shape. Let us try a log transform to see clearer peaks.
Now citric.acid looks normally distributed.
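One caveat with a log transform of citric.acid is that the variable contains zeros (its minimum is 0.000). The sketch below uses the first ten citric.acid values from the `str()` output to show what happens; the `log10(x + 1)` workaround is an assumption offered for illustration, not necessarily the transform used for the plot.

```r
# First ten citric.acid values, copied from the str() output above
citric <- c(0, 0, 0.04, 0.56, 0, 0, 0.06, 0, 0.02, 0.36)

# log10(0) is -Inf, so zero-valued wines cannot be placed on a log axis
log_citric <- log10(citric)
sum(is.infinite(log_citric))

# One possible workaround (assumption): shift by 1 before taking the log
log1_citric <- log10(citric + 1)
range(log1_citric)
```

Any histogram drawn on a plain log scale therefore silently leaves out the wines with no citric acid, which is worth keeping in mind when reading its shape.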
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 22.00 38.00 46.47 62.00 289.00
The maximum value (289) is far above the 3rd quartile (62). Large outliers pull the mean (46.47) well above the median (38).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 7.00 14.00 15.87 21.00 72.00
We can see that free.sulfur.dioxide and total.sulfur.dioxide are skewed to the right, but a log transform shows a more even, less skewed distribution. Both also have large outliers on the right side.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.900 1.900 2.200 2.539 2.600 15.500
We see that residual.sugar and chlorides look roughly normally distributed in the main body of the data, though the outliers lie far out on the right side.
Density has a nicely normal-looking histogram.
pH, like density, is normally distributed.
We see that alcohol is skewed to the right. Even after a log transform, it remains slightly right-skewed.
The data has 1599 rows and 13 variables, of which X is only an ID. quality is a factor and the remaining variables are continuous.
The main feature is quality, and we would like to see the effect of the other variables on the quality of wine.
Many variables describe the chemical composition of the wine, and each may contribute to its quality. In my opinion, if the quantity of a certain component is far below or above its optimum, it affects the quality of the wine a lot. We will review that in our analysis.
No. I will create new variables during the analysis as needed.
The distributions of some variables were skewed to the right. I used a log transform to get a better understanding of those distributions.
We also altered the x scale to focus on the area of interest and to keep outliers out of view.
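Narrowing the x scale can be done by capping the axis at a high percentile instead of deleting rows. The sketch below uses the first ten residual.sugar values from the `str()` output; the 95th-percentile cut-off and the names `sugar`, `upper` and `shown` are assumptions for illustration, since the report does not state which limits were used.

```r
# First ten residual.sugar values, copied from the str() output above
sugar <- c(1.9, 2.6, 2.3, 1.9, 1.9, 1.8, 1.6, 1.2, 2, 6.1)

# Upper axis limit at the 95th percentile (the cut-off is an assumption)
upper <- unname(quantile(sugar, 0.95))

# Values that remain visible once the axis is capped; the data itself is untouched
shown <- sugar[sugar <= upper]
length(shown)
```

The value of `upper` can then be passed as the upper limit of a plot's x axis, for example through the `xlim` argument of base R's `hist()`.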
The data description says that high levels of volatile acidity can give wine an unpleasant taste. That hints at a link between quality and volatile.acidity.
We see that as volatile.acidity decreases, the quality of wine increases, as expected.
We also see from the description document that “citric acid: found in small quantities, citric acid can add ‘freshness’ and flavor to wines”. Let us find out how citric.acid relates to quality.
As evident from the plots, higher citric acid goes with higher wine quality. They are positively correlated.
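The two trends can also be checked numerically from the per-quality medians printed in the by-quality summaries earlier. This is a small sketch under the assumption that medians are a fair summary of each group; the vector names are made up for the example.

```r
quality <- 3:8

# Per-quality medians copied from the by-quality summary() output
med_volatile <- c(0.845, 0.670, 0.580, 0.490, 0.370, 0.370)
med_citric   <- c(0.035, 0.090, 0.230, 0.260, 0.400, 0.420)

# Change in the median from one quality level to the next
diff(med_volatile)  # never positive: volatile acidity does not rise with quality
diff(med_citric)    # always positive: citric acid rises with quality

# Rank correlation between quality level and group median
cor(quality, med_volatile, method = "spearman")
cor(quality, med_citric, method = "spearman")
```

These are correlations between six group medians, so they say nothing about the spread within each quality level, which the box plots show to be considerable.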
It looks like the higher the alcohol content of a wine, the better the score it receives, but this effect only appears in wines with a quality of six or more; the lower quality levels have similar median values.
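The alcohol medians from the by-quality summaries illustrate this pattern. The sketch below simply compares the spread of the medians for quality 3 to 5 with the steps from quality 5 upwards; the name `med_alcohol` is an assumption for the example.

```r
# Median alcohol per quality level, copied from the by-quality summary() output
med_alcohol <- c("3" = 9.925, "4" = 10.00, "5" = 9.70,
                 "6" = 10.50, "7" = 11.50, "8" = 12.15)

# Spread of the medians among the lower quality levels
low_spread <- diff(range(med_alcohol[c("3", "4", "5")]))
low_spread

# Step in the median from each level to the next, from quality 5 upwards
high_steps <- diff(med_alcohol[c("5", "6", "7", "8")])
high_steps
```

The medians for quality 3 to 5 stay within a few tenths of a percentage point of each other, while every step from 5 to 8 is an increase.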
We see that sulphates are slightly positively correlated with quality. The relationship is weaker than for other variables, but it is there.
There seems to be a negative relationship between density and quality.
We know that more citric.acid results in a lower pH. So we can see here that the high quality wines, which have more citric.acid, have a lower pH. However, citric acid is a weak acid, so it is unclear how precise this effect is.
##
## Pearson's product-moment correlation
##
## data: rwines$volatile.acidity and rwines$citric.acid
## t = -26.489, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5856550 -0.5174902
## sample estimates:
## cor
## -0.5524957
We see a moderate negative correlation between volatile.acidity and citric.acid (-0.552).
The scatter plot also shows that volatile.acidity is negatively correlated (-0.552) with citric.acid.
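To see how the numbers in the test output above fit together, the t statistic, p-value and confidence interval can be reproduced from the reported correlation and degrees of freedom alone. This is a sketch of the standard formulas for a Pearson correlation test (the interval uses the Fisher z transformation); the variable names are chosen for the example.

```r
r  <- -0.5524957  # sample correlation from the cor.test() output
df <- 1597        # degrees of freedom, n - 2 with n = 1599

# t statistic for H0: true correlation = 0
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval via the Fisher z transformation
z  <- atanh(r)
se <- 1 / sqrt(df - 1)  # 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

t_stat
ci
```

The same calculation applies to the other correlation tests in this report by swapping in their `cor` and `df` values.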
##
## Pearson's product-moment correlation
##
## data: rwines$pH and rwines$citric.acid
## t = -25.767, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5756337 -0.5063336
## sample estimates:
## cor
## -0.5419041
We also see that pH and citric acid are negatively correlated (-0.542). A higher concentration of acid results in more acidity, i.e. a lower pH value.
##
## Pearson's product-moment correlation
##
## data: rwines$density and rwines$alcohol
## t = -22.838, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5322547 -0.4583061
## sample estimates:
## cor
## -0.4961798
Similarly, we see a negative correlation (-0.496) between density and alcohol.
We observed that volatile.acidity, citric.acid and density are directly related to the quality of wine. volatile.acidity and density had a negative relationship with quality, while citric.acid had a positive one.
We observed that citric.acid and volatile.acidity are negatively related. This could be because high quality wines have more citric acid and, as a result, a smaller share of volatile acids.
We saw that citric.acid has the strongest relationship, although we have not analysed all possible relationships yet.
We can see a clear effect: when citric acid increases, quality increases, presumably because it adds to the freshness of the wine.
##
## Pearson's product-moment correlation
##
## data: rwines$volatile.acidity and rwines$density
## t = 0.88044, df = 1597, p-value = 0.3788
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.02702409 0.07097074
## sample estimates:
## cor
## 0.02202623
The scatter plot of volatile.acidity and density shows no relationship: the correlation of 0.022 is not significantly different from zero (p = 0.379).
##
## Pearson's product-moment correlation
##
## data: rwines$volatile.acidity and rwines$citric.acid
## t = -26.489, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5856550 -0.5174902
## sample estimates:
## cor
## -0.5524957
As we found earlier, there is a negative relationship between citric acid and volatile.acidity. It is also evident from the scatter plot and the correlation coefficient (-0.552).
##
## Pearson's product-moment correlation
##
## data: rwines$citric.acid and rwines$density
## t = 15.665, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3216809 0.4066925
## sample estimates:
## cor
## 0.3649472
There is a weak to moderate positive relationship (0.365) between citric.acid and density.
We know that alcohol is less dense than water, so high alcohol should mean low density. Let us find out the relationship between density and alcohol.
##
## Pearson's product-moment correlation
##
## data: rwines$alcohol and rwines$density
## t = -22.838, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5322547 -0.4583061
## sample estimates:
## cor
## -0.4961798
As expected, we find a negative relationship (-0.496).
As expected, the plot shows that high volatile.acidity and low citric.acid are usually an indicator of bad quality.
As we have seen before, if citric.acid is higher and volatile.acidity is lower, the wines are of higher quality. We also saw that high alcohol content results in lower density, which is a property of high quality wines.
We also found that high quality wines

### Were there any interesting or surprising interactions between features?

All relationships are as expected.
## 3 4 5 6 7 8
## 10 53 681 638 199 18
A very high number of wines have medium quality (5 or 6). The wines of quality 3 (10), 4 (53) and 8 (18) make up about 5% of the total observations. About 12% are quality level 7 (199), while the remaining 82% or so are quality 5 (681) and 6 (638). We have no observations for quality 0, 1 and 2, nor for quality 9 and 10.
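The shares of each quality level can be recomputed directly from the counts in the table above. A minimal sketch, with variable names chosen for the example:

```r
# Counts per quality level, copied from the table above
counts <- c("3" = 10, "4" = 53, "5" = 681, "6" = 638, "7" = 199, "8" = 18)

# Percentage of all observations at each level
pct <- round(100 * counts / sum(counts), 1)
pct

# Grouped shares: rare levels (3, 4, 8), level 7, and the medium levels (5, 6)
rare   <- 100 * sum(counts[c("3", "4", "8")]) / sum(counts)
seven  <- 100 * counts[["7"]] / sum(counts)
medium <- 100 * sum(counts[c("5", "6")]) / sum(counts)
round(c(rare = rare, seven = seven, medium = medium), 1)
```

Working from the counts rather than from rounded percentages keeps the three grouped shares adding up to 100.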
Because the ratings are subjective, this imbalance might be a problem: we need more observations of high and low quality wines to work out more accurately what makes a good or a bad wine.
The most highly correlated variable () is shown in this box plot. There is a very big difference between the lowest and highest quality wines. The difference between the lowest (3, 4) and the highest (7, 8) quality wines is not as large as their difference from the medium (5, 6) quality wines.
We also see some overlap between the boxes, which means many low quality wines still have the same citric acid levels as the highest quality ones.
The three most influential variables, quality, volatile acidity and citric acid, are shown in this scatter plot. Each quality level is shown in a different color. There is a negative correlation (-0.552) between volatile acidity and citric acid.
We observe that the lowest quality wines are towards the higher end of the plot. But we also observe that quality 8 sits higher than quality 7. This shows that there might be a correlation, but it is not perfectly linear.
The data consists of 1599 observations. It contains 13 variables, one of which is only the ID. We also learnt from the description that the data is sensory data, with the quality rating calculated as the median of the scores assigned by three or more experts on a scale of zero to ten. That is probably why we do not have zero or 10 values.
Because the data is subjective, we expected some difficulty in identifying which factors raise the quality rating of a wine. Although we couldn’t find very strong relationships, we still found some that can help to predict the quality of wine. E.g. acetic acid, citric acid, pH and alcohol content each tell us something about the quality of a wine.
The analysis could be improved in three ways:

1. If we had the same number of observations for every quality level, it would be easier to find relationships between the variables.
2. It would also help to have data for wines of quality 1 and 2 or 9 and 10, since trends might show more clearly as the variables move towards their extremes.
3. We also need unbiased observational data, free of any effect of the color, temperature or age of the subjects.